Abstract


Trajectories provide dynamical information that is discarded in free energy calculations. We
therefore sought to design a scheme that saves the cost of generating dynamical information.
We first demonstrated that snapshots in a converged trajectory set are associated with implicit
conformers that have an invariant statistical weight distribution (ISWD). Based on the thought that
an infinite number of sets of implicit conformers with ISWD may be created through independent
converged trajectory sets, we hypothesized that explicit conformers with ISWD may be constructed
for complex molecular systems through a systematic increase of conformer fineness, and tested the
hypothesis on the lipid molecule palmitoyloleoylphosphatidylcholine (POPC). Furthermore, when
explicit conformers with ISWD were utilized as basic states to define conformational entropy, its
change between two given macrostates was found to be equivalent to the change of free energy
apart from a factor of negative temperature, and the change of enthalpy essentially cancels the
corresponding change of average intra-conformer entropy. These findings suggest that
entropy-enthalpy compensation is inherently a local phenomenon in configurational space. By
implicitly taking advantage of entropy-enthalpy compensation and forgoing all dynamical
information, constructing explicit conformers with ISWD and counting the thermally accessible
ones in the end macrostates of interest is likely to be an efficient and reliable alternative
end-point free energy calculation strategy.

Introduction


For two arbitrary macrostates A and B visited in a set of converged molecular dynamics (MD)
simulation trajectories, the free energy difference may be expressed as:

\[ \Delta F^{AB} = -k_B T \ln \frac{N_{snap}^{B}}{N_{snap}^{A}} \tag{1} \]

with $N_{snap}^{A(B)}$ being the observed number of snapshots in macrostate A (B), $k_B$ being the Boltzmann constant
and $T$ being the temperature. However, if a converged MD trajectory set is generated for the sole
purpose of calculating free energy differences between macrostate pairs of interest, all the
dynamical information it contains is discarded. One question we sought to answer is whether the
computational cost spent on generating dynamical information can be saved by designing a free
energy calculation method that does not explicitly use trajectories. A rarely discussed fact is
that each snapshot represents an implicit microscopic volume (termed conformer hereafter) in
configurational space. More importantly, Eq. (1) implies that, in a set of converged trajectories,
the implicit conformers associated with snapshots have an invariant statistical weight
distribution (ISWD) across the whole configurational space (see Fig. 1). Therefore, one way to
answer our original question is to accomplish the following two tasks: i) to construct a set of
configurational-space-filling* explicit conformers, the thermally accessible ones among which
have the property of ISWD (or a sufficiently good approximation of it), and ii) to design an
efficient method to count such conformers that are thermally accessible in given macrostates. To
be concise, we use “explicit conformers with ISWD (ECISWD)” to represent
“configurational-space-filling explicit conformers, the thermally accessible ones among which
have the property of ISWD (or a sufficiently good approximation of it)” hereafter. For two
arbitrary macrostates A and B that have $N_{conf}^{A}$ and $N_{conf}^{B}$ (note that both are
functions of potential energy) thermally accessible conformers, denoting the corresponding average
statistical weights of conformers as $w^{A}$ and $w^{B}$, the change of free energy between these
two macrostates may be written as:

\[ \Delta F^{AB} = -k_B T \ln \frac{N_{conf}^{B} w^{B}}{N_{conf}^{A} w^{A}} \tag{2} \]

For ECISWD, $w^{A} \approx w^{B}$, therefore:

\[ \Delta F^{AB} \approx -k_B T \ln \frac{N_{conf}^{B}}{N_{conf}^{A}} \tag{3} \]

It was demonstrated that sequential Monte Carlo (SMC) in combination with importance sampling
may rapidly count the number of explicit conformers that are thermally accessible. Therefore,
the pivotal issue is to construct a set of ECISWD. We set out to address this issue and its
implications in this study.

*Let the volume of the whole configurational space of an N-atom molecular system be $V_{3N}$. For
a set of M conformers, each with a non-overlapping volume $v_i$ ($i = 1, 2, \cdots, M$), if
$\sum_{i=1}^{M} v_i = V_{3N}$, then this set of conformers is configurational-space-filling.
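
To make the relation between the population-based formula (Eq. (1)) and the conformer-counting
formula (Eq. (3)) concrete, the following minimal sketch evaluates both with the same function.
The function name `delta_f_from_counts`, the unit choice for the Boltzmann constant and the counts
themselves are assumptions made for illustration; they are not taken from the study.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); the unit choice is an assumption


def delta_f_from_counts(n_a, n_b, temperature=300.0):
    """Return F_B - F_A = -k_B T ln(n_b / n_a) for two positive counts.

    With snapshot counts this is Eq. (1); with counts of thermally
    accessible conformers it is the form of Eq. (3).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both counts must be positive")
    return -K_B * temperature * math.log(n_b / n_a)


# Hypothetical counts, not data from the study.
df_snapshot = delta_f_from_counts(n_a=12000, n_b=3000)  # snapshot counts
df_conformer = delta_f_from_counts(n_a=800, n_b=200)    # conformer counts
print(df_snapshot, df_conformer)
```

In this made-up case the two macrostates have the same average number of snapshots per conformer,
so the two estimates coincide; any mismatch between them signals unequal average conformer weights.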


Hypothesis on ECISWD 

Conformers associated with MD snapshots are implicit, with no information available on their
shapes or sizes; consequently, we cannot learn their definition directly from MD trajectories.
One principal consideration in defining ECISWD is sufficient fineness. Since statistical weights
of complex molecular systems in general differ exponentially among macrostates, very coarse
conformers carry the possibility that the heaviest conformer in the statistically most dominant
macrostate weighs more than all other macrostates combined, rendering ISWD impossible.
Better uniformity is another factor to consider for the same reason. It is noted that ISWD holds
for each set of implicit conformers associated with the snapshots of an independent and converged
MD trajectory set. Therefore, an infinite number of ways exist for constructing sets of implicit
conformers with ISWD for a given complex molecular system. Based on this thought, we
hypothesized that any set of sufficiently fine and uniform conformers should approximately have
the property of ISWD, and we may consequently define ECISWD by systematically increasing
their fineness according to our convenience.

This hypothesis is immediately disproved by a simple double well system shown in Fig. 2. With an
increasing difference in potential energy $\Delta U$ between two wells A and B, regardless of the
fineness of any uniformly defined conformers, their statistical weight distributions in the two
macrostates will be increasingly different. The only way to achieve a sufficiently good
approximation of ISWD is to construct conformers that are properly weighted by U, the potential
energy surface that we do not know a priori in a real complex molecular system. Nonetheless,
complex molecular systems are very different from a double well system. As shown in Fig. 2, if we
divide macrostates A and B into $N_{conf}^{A}$ and $N_{conf}^{B}$ (e.g.
$N_{conf}^{A} = N_{conf}^{B} = 1{,}000{,}000$) conformers, U is consistently higher in A than
in B in terms of conformer average, and within each conformer U is essentially a constant. Such a
situation is unlikely, if at all possible, to occur in a complex molecular system. With a large
number of degrees of freedom (DOFs), tight packing and the steep van der Waals repulsive core of
the constituent atoms, potential energy may vary significantly within a microscopic volume of
configurational space. Therefore, we think that competition among a large number of DOFs may
render construction of ECISWD an achievable task, and the above-mentioned hypothesis may well be
valid for complex molecular systems.
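
The double well argument can be illustrated with a small numerical sketch. The model below is an
assumption made for illustration only: two flat wells of equal width, with U = `delta_u` in well A
and U = 0 in well B, each cut into `n_conf` equal slices that play the role of uniform conformers.
The function names are hypothetical.

```python
import math


def conformer_weight_ratio(delta_u, n_conf, kt=1.0):
    """Per-conformer statistical weight in well A divided by that in well B."""
    slice_width = 1.0 / n_conf                   # equal-width slices of a unit-width well
    w_a = slice_width * math.exp(-delta_u / kt)  # unnormalised weight of one slice in A
    w_b = slice_width                            # unnormalised weight of one slice in B
    return w_a / w_b


def exact_delta_f(delta_u, kt=1.0):
    """F_B - F_A = -kT ln(Z_B / Z_A) for the two flat unit-width wells."""
    z_a = math.exp(-delta_u / kt)
    z_b = 1.0
    return -kt * math.log(z_b / z_a)


for n_conf in (10, 1000, 1_000_000):
    print(n_conf, conformer_weight_ratio(delta_u=3.0, n_conf=n_conf))
print(exact_delta_f(3.0))
```

The weight ratio stays at exp(-delta_u / kT) however fine the slices are, so counting equal
numbers of uniform conformers in the two wells would give zero free energy difference while the
exact value in this toy model is -delta_u.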

Sufficiently well-converged MD trajectory sets of specific molecular systems provide ideal test
grounds for the ISWD property of given explicit conformers, based on the following two arguments.
Firstly, trajectory sets are generated by known force fields, and therefore no convolution of
force field inaccuracy and experimental error exists, as it does when computational results are
compared with experimental ones. Secondly, we may arbitrarily partition the configurational space
visited in a trajectory set, and a hypothesis that passes tests for arbitrarily given partitions
should remain true for the whole configurational space. This logic is important since traversing
the configurational space of complex molecular systems is practically impossible. The symbolic
equivalence between Eq. (1) and Eq. (3) suggests the following test for a set of ECISWD. If we
assign each snapshot in a trajectory set to its conformer and use Eq. (1) and Eq. (3)
respectively to calculate free energy changes for arbitrarily selected pairs of macrostates, the
differences in results caused by the different conformer definitions (between a given explicit
conformer set and the implicit one associated with snapshots) should decrease with increasing
size of the trajectory set and essentially disappear for a fully converged trajectory set. The
reason is that the free energy difference between two arbitrarily given macrostates does not
depend on the way it is calculated. Conversely, if the statistical weight distribution of a set
of explicit conformers differs widely in different parts of the configurational space, the
corresponding differences in results would increase with increasing size of the trajectory set
and saturate for a fully converged trajectory set, since the largest possible error is limited by
the number of available snapshots in any trajectory set that is not fully converged. Both the
complete disappearance of differences between Eq. (1) and Eq. (3) for the case of ECISWD, and the
full saturation of differences between these two equations for the case of explicit conformers
without ISWD, will be extremely difficult to observe for complex molecular systems due to the
excessive amount of data needed. Nonetheless, the trend should be equivalently informative as
long as the largest trajectory set is sufficiently well-converged.

We chose the lipid POPC to carry out such tests because large MD trajectory sets are available
for this molecule. Specifically, we firstly extracted MD trajectories of POPC from trajectories
of the M2 muscarinic acetylcholine receptor study. Three increasingly larger trajectory sets,
TSA1, TSA2 and TSA3, were constructed, with smaller trajectory sets being subsets of larger
ones. Secondly, we defined four different sets of conformers, denoted CONF1 through CONF4 (see
Fig. 3), with CONF1 being the finest and CONF4 being the coarsest. Thirdly, we used backbone
dihedrals as order parameters to construct macrostates through projection operations. Finally,
the number of conformers ($N_{conf}$) was calculated for each macrostate for the given
combination of trajectory set and definition of conformers (see Methods for details).

With the above definitions of conformers, macrostates and trajectory sets, we calculated
$\Delta F$ for all pairs of macrostates on each combination of conformer definition and
trajectory set according to Eq. (1) (denoted $\Delta F_{snapshot}$) and Eq. (3) (denoted
$\Delta F_{conformer}$) respectively. Their difference, denoted
$\delta\Delta F = \Delta F_{snapshot} - \Delta F_{conformer}$, essentially measures the
difference between our constructed set of explicit conformers and the implicit conformers
associated with snapshots. Distributions of $\delta\Delta F$ and the cumulative probability
density (CPD) of its absolute value for the four sets of explicit conformers (CONF1 through
CONF4) are shown in Fig. 4. Firstly, for CONF2 through CONF4 (Fig. 4b-d), the distribution of
$\delta\Delta F$ is significantly broader for larger trajectory sets. Secondly, the range of the
horizontal axis is widely different for these three sets of conformers (ranging from a fraction
of $k_BT$ to a few $k_BT$): for a given trajectory set, a dramatically broader distribution of
$\delta\Delta F$ is observed for coarser conformer definitions. Correspondingly, the CPD plots of
$|\delta\Delta F|$ (Fig. 4f-h) exhibit the extent of errors more directly. These observations
match our expectation for coarse conformers that do not have a sufficiently good approximation
of ISWD. Finally and most importantly, for CONF1 (Fig. 4a), the distribution of $\delta\Delta F$
is narrower for larger trajectory sets, and is significantly narrower than that of all other
conformer sets (Fig. 4b-d); the CPD plot (Fig. 4e) shows the differences among trajectory sets
more clearly. Therefore, conformers in set CONF1 match our expectation for ECISWD. The observed
behavior of CONF1 through CONF4 suggests that, as hypothesized, we may define a set of ECISWD
through a systematic increase of conformer fineness. Regarding the uniformity of conformers, we
partitioned each torsional DOF into three equal torsional states since we have no a priori
information that would justify a different division. To further test the hypothesis that any
sufficiently fine conformers should have a similarly good approximation of ISWD, we defined
additional sets of conformers with fineness similar to CONF1 through CONF4 respectively, and
similar observations were made (see Fig. 5). On different trajectory sets of POPC with sizes
similar to TSA1 through TSA3, similar observations were made (see Fig. 6). It is noted that
regardless of conformer definition and trajectory set size, distributions of $\delta\Delta F$
are approximately symmetric with the mode at zero (Fig. 4a-d, Fig. S1a-d and Fig. S2a-d). This is
inevitable since the selection of start and end macrostates is arbitrary and consistent in
calculating both $\Delta F_{snapshot}$ and $\Delta F_{conformer}$.
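
The comparison described above can be sketched in a few lines. The following is a minimal
illustration, assuming that each snapshot has already been assigned a macrostate label and a
conformer label; the function name `delta_delta_f` and the toy data are hypothetical, and the
result is expressed in units of $k_BT$.

```python
import math
from collections import defaultdict
from itertools import combinations


def delta_delta_f(assignments):
    """Return {(a, b): dF_snapshot - dF_conformer} for all macrostate pairs.

    assignments: iterable of (macrostate_id, conformer_id), one pair per snapshot.
    Values are in units of k_B T.
    """
    n_snap = defaultdict(int)
    conformers = defaultdict(set)
    for macro, conf in assignments:
        n_snap[macro] += 1
        conformers[macro].add(conf)

    result = {}
    for a, b in combinations(sorted(n_snap), 2):
        df_snapshot = -math.log(n_snap[b] / n_snap[a])                      # form of Eq. (1)
        df_conformer = -math.log(len(conformers[b]) / len(conformers[a]))  # form of Eq. (3)
        result[(a, b)] = df_snapshot - df_conformer
    return result


# Toy data: every conformer is visited equally often, so the two estimates agree.
balanced = [(0, c) for c in (0, 1) for _ in range(5)]
balanced += [(1, c) for c in (2, 3, 4, 5) for _ in range(5)]
print(delta_delta_f(balanced))

# Toy data: one heavy conformer in macrostate 0, so the two estimates differ.
skewed = [(0, 0)] * 10 + [(1, 1)] * 9 + [(1, 2)]
print(delta_delta_f(skewed))
```

Collecting these differences over all macrostate pairs and plotting their distribution for
increasingly large trajectory sets gives the kind of diagnostic discussed in the text.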

For coarser explicit conformers without ISWD, deviations from ISWD are expected to occur in
the heaviest macrostates, where a larger probability of excessively heavy conformers would cause
an uneven distribution of statistical weight. Again, such deviations are expected to be larger
for larger trajectory sets (and eventually saturate for a fully converged trajectory set). To
examine this, we plotted $-\ln N_{snap}$ vs. $-\ln N_{conf}$ for all constructed macrostates in
Fig. 7 for CONF1 and CONF4. Indeed, for CONF4 deviations occur for the heaviest macrostates and
are larger for larger trajectory sets (Fig. 7b,d,f). Perfect scaling was observed for CONF1
(Fig. 7a,c,e), as expected.


Conformational entropy based on ECISWD 

Typical molecular systems in chemical, materials and biological studies, when treated quantum
mechanically, present intractable complexity. The classical (continuous) representation of atomic
DOFs, however, presents an awkward situation for the definition of microstates and entropy.
Correspondingly, the density of states of classical systems may be determined only up to a
multiplicative factor. The term “conformational entropy”, despite its widespread usage, has no
well-established definition available for major complex biomolecular systems. Explicit
conformers with ISWD, despite their system dependence and the fact that an infinite number of
specific definitions exist for each given complex molecular system, may be utilized as basic
states for defining conformational entropy in an abstract and general sense for any complex
molecular system. We explore this idea and its implications in this section.

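It is well established in the field of information theory that for a given static distribution
with well-defined basic states, entropy may be constructed by arbitrary division of the whole
system into M subparts:

\[ S = -\sum_{i=1}^{N} p_i \ln p_i \tag{4} \]

\[ S = -\sum_{j=1}^{M} P_j \ln P_j + \sum_{j=1}^{M} P_j S_j \tag{5} \]

\[ S_j = -\sum_{k=1}^{k_j} p_k \ln p_k \tag{6} \]

where $P_j$ is the total probability of the jth subpart and $p_k$ is the probability of the kth
basic state within that subpart, with $p_i$, $P_j$ and $p_k$ being properly normalized:

\[ \sum_{i=1}^{N} p_i = 1, \quad \sum_{j=1}^{M} P_j = 1 \quad \text{and} \quad \sum_{k=1}^{k_j} p_k = 1 \;\; (j = 1, 2, \cdots, M) \tag{7} \]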

$S$ is the global informational entropy and the $S_j$ ($j = 1, 2, \cdots, M$) are local
informational entropies. It is noted that such division may be carried out recursively. We may
similarly construct both the local entropies of macrostates (say A and B) and the global entropy
of the given molecular system based on a set of explicit conformers:

\[ S = -k_B \sum_{i=1}^{N} P_i \ln P_i + \sum_{i=1}^{N} P_i S_i \tag{8} \]

\[ S^{A} = -k_B \sum_{j=1}^{N_{conf}^{A}} p_j \ln p_j + \sum_{j=1}^{N_{conf}^{A}} p_j S_j^{A} \tag{9} \]

\[ S^{B} = -k_B \sum_{k=1}^{N_{conf}^{B}} q_k \ln q_k + \sum_{k=1}^{N_{conf}^{B}} q_k S_k^{B} \tag{10} \]

$P_i$ is the probability of the ith conformer in the global configurational space, and $p_j$
($q_k$) is the probability of the jth (kth) conformer in macrostate A (B). $S_i$ is the
intra-conformer entropy of the ith conformer in the global configurational space, and $S_j^{A}$
($S_k^{B}$) is the intra-conformer entropy of the jth (kth) conformer in macrostate A (B). Again,
$P_i$, $p_j$ and $q_k$ are properly normalized:

\[ \sum_{i=1}^{N} P_i = 1, \quad \sum_{j=1}^{N_{conf}^{A}} p_j = 1, \quad \sum_{k=1}^{N_{conf}^{B}} q_k = 1 \tag{11} \]
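
The division of entropy into a between-subpart term and an average of within-subpart entropies
can be checked numerically. The sketch below works in units of $k_B$ and uses made-up
probabilities; the function names `shannon` and `grouped_entropy` are hypothetical. It verifies
that the entropy over all basic states equals the entropy of the subpart probabilities plus the
probability-weighted average of the entropies within each subpart.

```python
import math


def shannon(probs):
    """Shannon entropy -sum p ln p (units of k_B), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def grouped_entropy(groups):
    """Split the entropy of a grouped distribution into two terms.

    groups: list of lists of basic-state probabilities that sum to 1 overall.
    Returns (entropy of the group probabilities,
             weighted average of the within-group entropies).
    """
    group_probs = [sum(g) for g in groups]
    between = shannon(group_probs)
    within = sum(
        big_p * shannon([p / big_p for p in g])
        for g, big_p in zip(groups, group_probs)
        if big_p > 0.0
    )
    return between, within


# Made-up probabilities of six basic states divided into three subparts.
groups = [[0.10, 0.20], [0.05, 0.25, 0.10], [0.30]]
flat = [p for g in groups for p in g]
between, within = grouped_entropy(groups)
print(shannon(flat), between + within)
```

When the subparts are conformers and the basic states are the microstates inside them, the two
returned terms correspond to the conformational and the average intra-conformer contributions.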
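The first terms on the right-hand side of Eqs. (8), (9) and (10) describe distributions of
conformer statistical weights within a macrostate or within the whole configurational space, and
are referred to as “conformational entropy” ($S_{conf}$). The second terms are averages of the
intra-conformer entropies of the corresponding conformers and are denoted $S_{int}$. We may
rewrite $S^{A}$ and $S^{B}$ in the following form:

\[ S^{A} = S_{conf}^{A} + S_{int}^{A} \tag{12} \]

\[ S_{conf}^{A} = -k_B \sum_{j=1}^{N_{conf}^{A}} p_j \ln p_j \tag{13} \]

\[ S_{int}^{A} = \sum_{j=1}^{N_{conf}^{A}} p_j S_j^{A} \tag{14} \]

\[ S^{B} = S_{conf}^{B} + S_{int}^{B} \tag{15} \]

\[ S_{conf}^{B} = -k_B \sum_{k=1}^{N_{conf}^{B}} q_k \ln q_k \tag{16} \]

\[ S_{int}^{B} = \sum_{k=1}^{N_{conf}^{B}} q_k S_k^{B} \tag{17} \]

With a simple algebraic manipulation shown below:

\[ S_{conf}^{A} = -k_B \sum_{j=1}^{N_{conf}^{A}} p_j \left( \ln p_j - \ln \frac{1}{N_{conf}^{A}} + \ln \frac{1}{N_{conf}^{A}} \right)
 = k_B \ln N_{conf}^{A} - k_B \sum_{j=1}^{N_{conf}^{A}} p_j \ln \left( N_{conf}^{A} p_j \right) \tag{18} \]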
The conformational entropy of macrostate A is divided into two terms. The first term is the
Boltzmann entropy (or ideal gas entropy, denoted $S_{Boltzmann}^{A} = k_B \ln N_{conf}^{A}$)
based on the number of conformers. The second term represents the deviation from the Boltzmann
entropy (denoted $\delta S_{conf}^{A}$). It is the negative of the product of the Boltzmann
constant and the Kullback-Leibler divergence between the actual probability distribution of
conformers in macrostate A ($p = (p_1, p_2, \cdots, p_{N_{conf}^{A}})$) and the uniform
distribution ($unif\{1, N_{conf}^{A}\}$). $S_{conf}^{A}$ may be rewritten as:

\[ S_{conf}^{A} = S_{Boltzmann}^{A} + \delta S_{conf}^{A} \tag{19} \]

\[ \delta S_{conf}^{A} = -k_B D_{KL}\left( p \,\|\, unif\{1, N_{conf}^{A}\} \right) \tag{20} \]


Similarly, denoting the probability distribution of conformers in macrostate B as
$q = (q_1, q_2, \cdots, q_{N_{conf}^{B}})$ and the corresponding uniform distribution as
$unif\{1, N_{conf}^{B}\}$, we have:

\[ S_{conf}^{B} = S_{Boltzmann}^{B} + \delta S_{conf}^{B} \tag{21} \]

\[ \delta S_{conf}^{B} = -k_B D_{KL}\left( q \,\|\, unif\{1, N_{conf}^{B}\} \right) \tag{22} \]


For ECISWD, if we denote the corresponding ISWD with a continuous probability density R, then
$p \sim R$ and $q \sim R$. Denoting the continuous uniform distribution as unif, we have:

\[ \delta S_{conf}^{A} \approx -k_B D_{KL}(R \,\|\, unif) \tag{23} \]

\[ \delta S_{conf}^{B} \approx -k_B D_{KL}(R \,\|\, unif) \tag{24} \]

\[ \delta\Delta S_{conf}^{AB} = \delta S_{conf}^{B} - \delta S_{conf}^{A} \approx 0 \tag{25} \]

\[ \Delta S_{conf}^{AB} \approx k_B \ln \frac{N_{conf}^{B}}{N_{conf}^{A}} \tag{26} \]

Note that $\Delta S_{conf}^{AB}$ (Eq. (26)) is equivalent to $\Delta F^{AB}$ (Eq. (3)) apart from
a factor of negative temperature. $\delta\Delta S_{conf}^{AB}$ reflects the difference between
two KL divergences, which correspond to the distances between the statistical weight
distribution of conformers in macrostate A (B) and the uniform distribution. The advantage of
utilizing ECISWD for defining conformational entropy is generality, achieved by concealing
system-specific molecular structural information in the specific definition of conformers.
Additionally, when the difference of conformational entropy is taken between two arbitrary
macrostates, the deviation of the unknown ISWD from the uniform distribution is cancelled and we
need only to deal with the number of conformers. Based on the same logic as in the case of free
energy analysis, with increasingly larger subsets of a sufficiently well-converged MD trajectory
set, we expect to observe a systematic decrease in the magnitude of $\delta\Delta S_{conf}$
calculated for arbitrarily defined macrostate pairs as long as ECISWD are the basic states of
conformational entropy. Conversely, we expect to observe a systematic increase in the magnitude
of $\delta\Delta S_{conf}$ when explicit conformers with widely variant statistical weight
distributions are the basic states of conformational entropy. To this end, we took the same
trajectory sets, definitions of conformers and macrostates as in the analysis of
$\delta\Delta F$, and calculated the corresponding
$\delta\Delta S_{conf} = \delta S_{conf}^{B} - \delta S_{conf}^{A}$ based on Eq. (20) and
Eq. (22) for each macrostate pair. Both the distributions of $\delta\Delta S_{conf}$ and the
corresponding CPD of its absolute value are shown in Fig. 8. As expected, and consistent with
the free energy analysis shown in Fig. 4, the trend of $\delta\Delta S_{conf}$ based on
conformers in set CONF1 (Fig. 8a,e) matches our expectation for ECISWD, while the trends of
$\delta\Delta S_{conf}$ based on conformers in sets CONF2 through CONF4 (Fig. 8b-d, f-h) match
our expectation for conformers with variant statistical weight distributions, with coarser
conformers and larger trajectory sets corresponding to wider distributions of
$\delta\Delta S_{conf}$.
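
The split of conformational entropy into a Boltzmann term and a Kullback-Leibler correction, and
the cancellation of that correction between two macrostates, can be checked with a small sketch.
Entropies are in units of $k_B$. The conformer weights are made up so that the two macrostates
share the same weight distribution shape, which is the idealized ISWD situation; the function
names are hypothetical.

```python
import math


def conformational_entropy(probs):
    """S_conf / k_B = -sum p ln p over conformers with p > 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_to_uniform(probs):
    """Kullback-Leibler divergence from the uniform distribution on len(probs) conformers."""
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0.0)


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


# Made-up conformer weights: B has twice as many conformers as A, same shape.
p = normalise([4, 3, 2, 1])              # macrostate A
q = normalise([4, 4, 3, 3, 2, 2, 1, 1])  # macrostate B

s_a, s_b = conformational_entropy(p), conformational_entropy(q)
print(s_a, math.log(len(p)) - kl_to_uniform(p))   # S_conf = ln N - D_KL
print(kl_to_uniform(q) - kl_to_uniform(p))        # KL corrections cancel
print(s_b - s_a, math.log(len(q) / len(p)))       # entropy change = ln(N_B / N_A)
```

With weights of different shapes in the two macrostates the second printed value would be
nonzero, which is the deviation that the distributions of $\delta\Delta S_{conf}$ measure.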

Entropy enthalpy compensation 

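In the canonical ensemble, we have:

\[ \Delta F^{AB} = \Delta U^{AB} - T\Delta S^{AB} = \Delta U^{AB} - T(S^{B} - S^{A}) \tag{27} \]

with $\Delta U^{AB}$ being the change of potential energy between the two macrostates A and B.
Let $\Delta S_{int}^{AB} = S_{int}^{B} - S_{int}^{A}$. Substituting Eqs. (12), (15), (19), (21)
and (25) into Eq. (27) and comparing the result with Eq. (3), we have:

\[ \Delta U^{AB} - T\Delta S_{int}^{AB} \approx 0 \tag{28} \]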
While the derivation is carried out in the canonical ensemble, it should be applicable to many
isobaric-isothermal processes (e.g. many biomolecular systems under physiological or routine
experimental conditions) where the change of the PV term is negligible. Note that Eq. (28) is
the intriguing entropy-enthalpy compensation (EEC) phenomenon (when the PV term is negligible),
which has long been an enigma and has attracted a revival of interest due to its critical
relevance in protein-ligand interactions. Careful statistical analysis confirms that EEC does
exist to various extents in many protein-ligand interaction systems after experimental errors
are effectively removed. For a given molecular system, once we have constructed a set of ECISWD,
Eq. (3) and Eq. (28) state that a change of molecular interactions does not necessarily cause a
change of free energy, which depends on the relative number of thermally accessible ECISWD in
the end macrostates, and that local effects from the change of molecular interactions will be
cancelled almost completely by the corresponding change of average intra-conformer entropy. Note
that no correlation of either signs or magnitudes between $\Delta S_{conf}^{AB}$ and
$\Delta S_{int}^{AB}$ is implied. Therefore, depending upon the signs and magnitudes of
$\Delta S_{conf}^{AB}$ and $\Delta S_{int}^{AB}$ (we neglect the PV term here), this theory is
compatible with molecular processes driven by enthalpy, entropy or both, and with various
extents of observed EEC. When $\Delta S_{conf}^{AB} \approx 0$, perfect EEC would be observed;
when $\Delta S_{conf}^{AB} > 0$ and $\Delta U^{AB} > 0$ (or $\Delta S_{int}^{AB} > 0$), a
seemingly entropy-driven (and a reverse entropy-limited) process would be observed; when
$\Delta S_{conf}^{AB} > 0$ and $\Delta U^{AB} < 0$ (or $\Delta S_{int}^{AB} < 0$), depending
upon the sign of $\Delta S^{AB} = \Delta S_{conf}^{AB} + \Delta S_{int}^{AB}$, a seemingly
enthalpy-driven or entropy-enthalpy jointly driven (and a reverse enthalpy-limited or
entropy-enthalpy jointly limited) process would be observed. The fundamental new perspective
provided by Eqs. (3), (26) and (28) is that EEC is directly related to the local redistribution
of microstates in configurational space, while changes of free energy and conformational entropy
reflect the collective thermal accessibility of the relevant macrostates. System complexity is
essential for the construction of ECISWD, as demonstrated by our initial discussion of the
double well model. Consistently, the robustness of the approximations in Eq. (3) and Eq. (26)
corresponds to the near-perfect cancellation of the change of intra-conformer entropy and the
change of enthalpy as reflected by Eq. (28). Without a sufficient number of complex and
heterogeneous microstates within each conformer, it is hard to imagine how such EEC could occur.
Along the same lines, a simple Morse-potential type of protein-ligand interaction model was not
found to allow significant EEC. Based on the widespread observation of strong EEC effects in
many molecular systems, it was suggested that any attempt to calculate the change of free energy
as a sum of its enthalpic and entropic contributions is likely to be unreliable. The proposed
conformer counting strategy (Eq. (3)) implicitly utilizes EEC by completely avoiding direct
calculation of $\Delta U$ and $\Delta S_{int}$, which is expensive and error-prone.
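
The bookkeeping behind this argument can be followed on a toy system with discrete microstates.
The sketch below is an illustration under stated assumptions, not the procedure used in this
study: energies are in units of $k_BT$, entropies in units of $k_B$, each conformer is a list of
made-up microstate energies, and the function name `macrostate_terms` is hypothetical. It checks
that the free energy change equals the energy change minus T times the sum of the conformational
and intra-conformer entropy changes, and prints the residual that the compensation relation
asserts to be small for ECISWD.

```python
import math

KT = 1.0  # energies in units of k_B T; entropies in units of k_B


def macrostate_terms(conformers):
    """Return (F, U, S_conf, S_int) for one macrostate.

    conformers: list of conformers, each a list of microstate energies.
    """
    z_conf = [sum(math.exp(-e / KT) for e in energies) for energies in conformers]
    z_total = sum(z_conf)
    free_energy = -KT * math.log(z_total)
    u = s_conf = s_int = 0.0
    for energies, z_j in zip(conformers, z_conf):
        p_j = z_j / z_total                                   # conformer probability
        p_micro = [math.exp(-e / KT) / z_j for e in energies]  # probabilities inside it
        u += p_j * sum(p * e for p, e in zip(p_micro, energies))
        s_int += p_j * -sum(p * math.log(p) for p in p_micro)
        s_conf += -p_j * math.log(p_j)
    return free_energy, u, s_conf, s_int


# Made-up microstate energies, not taken from any simulation.
state_a = [[0.0, 0.5, 1.0], [0.2, 0.9]]
state_b = [[-1.0, 2.0], [0.3, 0.4, 3.0], [1.5]]

f_a, u_a, sc_a, si_a = macrostate_terms(state_a)
f_b, u_b, sc_b, si_b = macrostate_terms(state_b)
d_f, d_u = f_b - f_a, u_b - u_a
d_sconf, d_sint = sc_b - sc_a, si_b - si_a

print(d_f, d_u - KT * (d_sconf + d_sint))     # both sides of the form of Eq. (27)
print(d_u - KT * d_sint, d_f + KT * d_sconf)  # local term equals dF + T dS_conf
```

The last line shows that the local term, the energy change minus T times the intra-conformer
entropy change, is exactly the amount by which the free energy change differs from minus T times
the conformational entropy change. In this arbitrary toy system it need not vanish; the claim of
the text is that it becomes negligible when the conformers are ECISWD.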

Conclusions 

In summary, we presented the idea that snapshots in a converged MD trajectory set map directly
to implicit thermally accessible conformers with ISWD. Based on the thought that an infinite
number of ways exist for defining implicit conformers with ISWD for a given molecular system, we
hypothesized that any sufficiently fine set of conformers should have a sufficiently good
approximation of ISWD. This hypothesis, while disproved by a double well potential, was tested
successfully on extensive MD trajectories of the lipid POPC. We think that the competition of
many DOFs, each allowed to vary significantly in both potential energy and spatial position
within a conformer, constitutes the foundation for the observed validity of the hypothesis.
Considering the moderate complexity of the lipid POPC, it is likely that the hypothesis holds
for complex molecular systems in general. This is a useful demonstration of the idea that “More
is different”. Active research is ongoing in our group toward defining ECISWD for more
biomolecular systems (e.g. protein-ligand, protein-protein and protein-nucleic acid interaction
systems with explicit or implicit solvation). Furthermore, when ECISWD are utilized as basic
states for the definition of conformational entropy, its change between two macrostates was
found to be equivalent to the corresponding change of free energy apart from a factor of
negative temperature. Meanwhile, the change of potential energy between two macrostates was
found to cancel the corresponding change of average intra-conformer entropy. This finding
suggests that EEC is inherently a local phenomenon in configurational space, and is likely
universal in complex molecular systems. While providing an alternative perspective on the
long-standing enigma of EEC, this result is consistent with the different extents of EEC
observed for both enthalpy-driven and entropy-driven molecular processes in the conventional
sense, where the change of enthalpy is compared with the change of total entropy. Counting
thermally accessible ECISWD (Eq. (3)) is a natural extension of the population-based free
energy formula (Eq. (1)), which is only useful after a converged simulation. However, Eq. (3)
effectively utilizes EEC implicitly through the separation of entropy into conformational
entropy based on ECISWD and intra-conformer entropy, and renders direct utilization of SMC and
importance sampling possible for rapid estimation of free energy differences. In accordance
with the “no free lunch theorem”, this expected gain in efficiency comes at the price of all
dynamical and pathway information associated with converged trajectories.

Methods 

To define conformers, we first take a given set of torsional DOFs (Fig. 3), each divided into
three equally sized torsional states with boundaries at 0° (360°), 120° and 240°, and
subsequently use their unique combinations as conformers. Two structural states (i.e. snapshots)
of a POPC molecule belong to the same conformer if and only if they share the same torsional
state for each selected torsional DOF. Apparently, an infinite number of ways exist to define
sets of conformers with similar fineness and uniformity.
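
The conformer definition above can be sketched as follows. This is a minimal illustration,
assuming dihedral angles are given in degrees and torsions are indexed from zero; the function
names `torsional_state` and `conformer_key` are hypothetical and not part of any published code.

```python
def torsional_state(angle_deg):
    """Map a dihedral angle to a state: [0,120) -> 0, [120,240) -> 1, [240,360) -> 2."""
    # The final "% 3" guards against a wrapped angle that rounds up to exactly 360.0.
    return int((angle_deg % 360.0) // 120.0) % 3


def conformer_key(torsions_deg, selected):
    """Return a hashable conformer label for one snapshot.

    torsions_deg: sequence of dihedral angles in degrees, one per torsion.
    selected: zero-based indices of the torsions that define the conformer set.
    """
    return tuple(torsional_state(torsions_deg[i]) for i in selected)


# Two hypothetical snapshots described by three torsions each.
snap_1 = [10.0, 130.0, 250.0]
snap_2 = [100.0, 200.0, -90.0]
print(conformer_key(snap_1, selected=[0, 1, 2]))
print(conformer_key(snap_2, selected=[0, 1, 2]))
```

Both snapshots receive the same label because every selected torsion falls into the same state,
so they belong to the same conformer; using fewer torsions in `selected` gives a coarser
conformer set.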

To prepare macrostates, all snapshots in a given trajectory set were projected onto a selected
backbone dihedral that was partitioned into twenty 18° windows; the snapshots that fall within
each window constitute an observed macrostate. Such projections were performed for each of the
43 dihedrals (Fig. 3), giving collectively 860 macrostates for each given combination of
trajectory set and conformer definition. Apparently, macrostates based on the same dihedral
angle do not overlap, while those based on different dihedral angles may overlap to different
extents. To assign each snapshot to its conformer and calculate $N_{conf}$ for each constructed
macrostate, torsional states for the selected torsional DOFs were encoded into bit vectors and
the radix sort algorithm was utilized.
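
A minimal sketch of this bookkeeping is given below. It packs torsional states into an integer
key with two bits per torsion (one possible bit-vector encoding, assumed here), bins the
projected dihedral into 18° windows, and counts distinct keys per window by sorting. Python's
built-in sort stands in for the radix sort mentioned above, and all function names and data are
hypothetical.

```python
from collections import defaultdict


def encode_states(states):
    """Pack torsional states (each 0, 1 or 2) into one integer, two bits per torsion."""
    key = 0
    for s in states:
        key = (key << 2) | s
    return key


def window_index(angle_deg, width=18.0):
    """Index of the window of the given width (in degrees) that contains the angle."""
    n_windows = round(360.0 / width)
    return int((angle_deg % 360.0) // width) % n_windows


def count_conformers(keys):
    """Number of distinct keys, found by sorting and counting run boundaries."""
    ordered = sorted(keys)  # built-in sort used in place of a radix sort
    return sum(1 for i, k in enumerate(ordered) if i == 0 or k != ordered[i - 1])


# Hypothetical snapshots: (projected dihedral in degrees, torsional states of selected DOFs).
snapshots = [
    (5.0, [0, 1, 2]),
    (12.0, [0, 1, 2]),
    (17.0, [2, 1, 0]),
    (200.0, [1, 1, 1]),
]
keys_by_window = defaultdict(list)
for angle, states in snapshots:
    keys_by_window[window_index(angle)].append(encode_states(states))
n_conf = {w: count_conformers(keys) for w, keys in keys_by_window.items()}
print(n_conf)
```

Sorting the integer keys places snapshots of the same conformer next to each other, so a single
pass over the sorted list is enough to count the conformers observed in each macrostate.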

Trajectory sets TSA1, TSA2 and TSA3 were constructed from snapshots of POPC collected in
simulation condition A in supplementary table 2 of the GPCR simulation study. There were in
total 34143653 snapshots, which collectively amount to ~6.15 ms (6.14585754 ms). Five subsets,
with collective lengths (CL) of ~1.58 ms, ~1.32 ms, ~1.32 ms, ~1.32 ms and ~0.66 ms
respectively, were available for this simulation condition. We took the first six of the 66
trajectories of the first subset as TSA1, which has a CL of 142.56 μs. The first subset
(~1.58 ms) was taken as TSA2, and the union of all subsets was taken as TSA3 (~6.15 ms).
Table 1: Detailed list of comprising atoms of the 43 torsions utilized in defining conformers and
macrostates for POPC.

Index  atom1  atom2  atom3  atom4
1      C12    N      C11    C15
2      N      C11    C15    O1
3      C11    C15    O1     P1
4      C15    O1     P1     O2
5      O1     P1     O2     C1
6      P1     O2     C1     C2
7      O2     C1     C2     O21
8      C1     C2     O21    C21
9      C2     O21    C21    C22
10     O2     C1     C2     C3
11     C1     C2     C3     O31
12     C2     C3     O31    C31
13     C3     O31    C31    C32
14     O21    C21    C22    C23
15     C21    C22    C23    C24
16     C22    C23    C24    C25
17     C23    C24    C25    C26
18     C24    C25    C26    C27
19     C25    C26    C27    C28
20     C26    C27    C28    C29
21     C27    C28    C29    C210
22     C28    C29    C210   C211
23     C29    C210   C211   C212
24     C210   C211   C212   C213
25     C211   C212   C213   C214
26     C212   C213   C214   C215
27     C213   C214   C215   C216
28     C214   C215   C216   C217
29     C215   C216   C217   C218
30     O31    C31    C32    C33
31     C31    C32    C33    C34
32     C32    C33    C34    C35
33     C33    C34    C35    C36
34     C34    C35    C36    C37
35     C35    C36    C37    C38
36     C36    C37    C38    C39
37     C37    C38    C39    C310
38     C38    C39    C310   C311
39     C39    C310   C311   C312
40     C310   C311   C312   C313
41     C311   C312   C313   C314
42     C312   C313   C314   C315
43     C313   C314   C315   C316

Figure 1: A schematic illustration of the ISWD property in two dimensions for implicit conformers
associated with snapshots in a converged MD trajectory set. Red points represent snapshots; the
corresponding dashed squares represent the associated implicit conformers, with darker grayscale
indicating heavier statistical weight. With the shown variation of the statistical weight
distribution of implicit conformers in the vertical direction, the ratio of statistical weights
of macrostates A and B equals $N_{snap}^{A} / N_{snap}^{B}$ (left), while that of macrostates C
and D does not equal $N_{snap}^{C} / N_{snap}^{D}$ (right). As long as variation of the
statistical weight distribution exists, we may always find a pair of macrostates like C and D.
Therefore, robustness of the population-based free energy formula (Eq. (1)) is equivalent
to the ISWD property of the corresponding set of implicit conformers.

Figure 3: Ball-and-stick representation of POPC and definition of conformer sets. Oxygen: red,
hydrogen: white, carbon: cyan, phosphate: blue. The 43 all-heavy-atom torsions (see Table 1
for detailed lists of comprising atoms) utilized to define conformers are labeled with numbers on
their central bonds. Set CONF1 is defined with all 43 torsions; set CONF2 is defined by 28
torsions, which are {2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18, 20, 21, 23, 24, 26, 27, 29, 30,
32, 33, 35, 36, 38, 39, 41, 42}; set CONF3 is defined by the 22 odd-numbered torsions; and set
CONF4 is defined by the 15 torsions that are excluded in the definition of CONF2.



Figure 4: Distributions of $\delta\Delta F$ (a-d) and CPD of its absolute values (e-h) for POPC
with four sets of explicit conformers (CONF1 through CONF4, which are indicated in the
horizontal axis label as subscripts, e.g. $\delta\Delta F_{conf1}$ in (a) and
$|\delta\Delta F_{conf1}|$ in (e)). Different trajectory sets are represented by different line
colors. The unit of the horizontal axis is $k_BT$.

Figure 5: Distributions of $\delta\Delta F$ (a-d) and CPD of its absolute values (e-h) for POPC
with conformer sets CONF5 through CONF8, which are defined similarly to CONF1 through CONF4
except that the torsional state boundaries are 60°, 180° and 300°. Different trajectory sets are
represented by different line colors. The unit of the horizontal axis is $k_BT$.




Figure 6: Distributions of $\delta\Delta F$ (a-d) and CPD of its absolute values (e-h) for POPC
with conformer sets CONF1 through CONF4 on trajectory sets TSB1 through TSB3. These trajectory
sets were constructed from snapshots of POPC collected in simulation condition B in
supplementary table 2 of the GPCR simulation study. There were 36724760 snapshots, which
collectively amount to a collective length of ~6.61 ms (6.6104568 ms). Five subsets, each
including 56 trajectories with a collective length of ~1.32 ms, were available for this
simulation condition. After the trajectories of the first subset were sorted according to file
name, the first six trajectories were taken as TSB1 (~200 μs). The first subset was taken as
TSB2 (~1.32 ms), and the union of all subsets was taken as TSB3 (~6.61 ms). Different trajectory
sets are represented by different line colors. The unit of the horizontal axis is $k_BT$.

Figure 7: Plots of $-\ln N_{snap}$ vs. $-\ln N_{conf}$ for CONF1 (left) and CONF4 (right) on the
three trajectory sets. Blue lines represent situations where Eq. (3) holds sufficiently well.
Each red dot represents a macrostate; green lines are the best linear fits to the observed data,
annotated with the squared linear correlation coefficients.
















Figure 8: Distributions of $\delta\Delta S_{conf}$ (a-d) and CPD of its absolute values (e-h) for
POPC with four sets of explicit conformers (CONF1 through CONF4, which are indicated in the
horizontal axis label as subscripts, e.g. $\delta\Delta S_{conf1}$ in (a) and
$|\delta\Delta S_{conf1}|$ in (e)). Different trajectory sets are represented by different line
colors. The unit of the horizontal axis is $k_B$.